
Post-training Quantization


Q-VLM: Post-training Quantization for Large Vision-Language Models

Neural Information Processing Systems

In this paper, we propose a post-training quantization framework for large vision-language models (LVLMs) to enable efficient multi-modal inference. Conventional quantization methods sequentially search the layer-wise rounding functions by minimizing activation discretization errors, which fails to acquire the optimal quantization strategy because it ignores cross-layer dependency.
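The conventional baseline the abstract criticizes can be sketched as follows: for each layer independently, grid-search a quantization scale that minimizes the activation discretization error (MSE). This is a minimal illustrative sketch, not the paper's method; the function names, grid range, and 4-bit setting are assumptions for illustration.

```python
import numpy as np

def quantize(x, scale, bits=4):
    # Uniform symmetric quantization with a given scale.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale  # dequantized values for error measurement

def search_layer_scale(activations, bits=4, n_grid=50):
    # Layer-wise PTQ baseline: grid-search the scale that minimizes the
    # activation discretization error (MSE) for this one layer, ignoring
    # how the error propagates to later layers (no cross-layer dependency).
    max_val = np.abs(activations).max()
    qmax = 2 ** (bits - 1) - 1
    best_scale, best_err = None, np.inf
    for ratio in np.linspace(0.2, 1.0, n_grid):
        scale = ratio * max_val / qmax
        err = np.mean((quantize(activations, scale, bits) - activations) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

rng = np.random.default_rng(0)
acts = rng.normal(size=(128, 64))  # toy calibration activations
scale, err = search_layer_scale(acts)
```

Searching each layer in isolation like this is exactly the greedy strategy whose sub-optimality under cross-layer dependency motivates the paper.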






Supplementary Material

Neural Information Processing Systems

The results of AddNN are largely consistent with the results in [1]. The accuracy drops after post-training quantization are reported in Table 4.



PTQD: Accurate Post-Training Quantization for Diffusion Models Yefei He

Neural Information Processing Systems

Diffusion models have recently dominated image synthesis and other related generative tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications.


FP8 Quantization: The Power of the Exponent Andrey Kuzmin, Mart van Baalen

Neural Information Processing Systems

Neural network quantization is one of the most effective ways to improve the efficiency of neural networks. Quantization allows weights and activations to be represented in low bit-width formats, e.g. 8 bit integers (INT8).
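The INT8 representation the abstract mentions can be illustrated with symmetric per-tensor quantization: floats are mapped onto signed 8-bit integers via a single scale factor. This is a generic sketch of INT8 quantization, not the FP8 scheme the paper proposes; the function names are assumptions for illustration.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor INT8 quantization:
    # map real values onto the integer range [-127, 127] with one scale.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate real values from the integer codes.
    return q.astype(np.float32) * scale

x = np.random.default_rng(1).normal(size=1000).astype(np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
```

Because INT8 has a fixed step size per tensor, the rounding error of any unclipped element is bounded by half the scale; FP8's exponent bits trade this uniform step for a wider dynamic range, which is the trade-off the paper studies.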